An Incremental Subspace Learning Algorithm to Categorize Large Scale Text Data

Authors

  • Jun Yan
  • QianSheng Cheng
  • Qiang Yang
  • Benyu Zhang
Abstract

The dramatic growth in the number and size of online information sources has fueled increasing research interest in the incremental subspace learning problem. In this paper, we propose an incremental supervised subspace learning algorithm called the Incremental Inter-class Scatter (IIS) algorithm. Unlike traditional batch learners, IIS learns from a stream of training data rather than from a fixed set. IIS overcomes the inherent problems of some other incremental methods, such as the incremental versions of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Experimental results on synthetic datasets show that IIS performs as well as LDA and is more robust against noise. In addition, experiments on the Reuters Corpus Volume 1 (RCV1) dataset show that IIS outperforms the state-of-the-art Incremental Principal Component Analysis (IPCA) algorithm, a related algorithm, and Information Gain in efficiency and effectiveness, respectively.
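
The abstract does not spell out the IIS update rule, so the following is only a minimal sketch of the general idea of learning from a labelled stream: it keeps running class means, forms the standard inter-class scatter matrix from them, and extracts its leading direction by power iteration. The class name `StreamingInterClassScatter`, the function `leading_direction`, and the choice of power iteration are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


class StreamingInterClassScatter:
    """Running class means and inter-class scatter from a labelled stream.

    Illustrative only: this is NOT the IIS update rule from the paper.
    """

    def __init__(self, dim):
        self.dim = dim
        self.n = 0                                  # samples seen so far
        self.mean = [0.0] * dim                     # global running mean
        self.class_n = defaultdict(int)             # per-class sample counts
        self.class_mean = defaultdict(lambda: [0.0] * dim)

    def update(self, x, label):
        """Fold one labelled sample into the running statistics."""
        self.n += 1
        self.class_n[label] += 1
        k = self.class_n[label]
        m = self.class_mean[label]
        for j in range(self.dim):
            self.mean[j] += (x[j] - self.mean[j]) / self.n
            m[j] += (x[j] - m[j]) / k

    def scatter(self):
        """Return S_b = sum_i p_i (m_i - m)(m_i - m)^T with p_i = n_i / n."""
        s = [[0.0] * self.dim for _ in range(self.dim)]
        for label, k in self.class_n.items():
            d = [a - b for a, b in zip(self.class_mean[label], self.mean)]
            p = k / self.n
            for r in range(self.dim):
                for c in range(self.dim):
                    s[r][c] += p * d[r] * d[c]
        return s


def leading_direction(matrix, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    dim = len(matrix)
    v = [1.0 / (j + 1) for j in range(dim)]         # fixed, non-zero start vector
    for _ in range(iters):
        w = [sum(matrix[r][c] * v[c] for c in range(dim)) for r in range(dim)]
        norm = sum(a * a for a in w) ** 0.5
        if norm == 0.0:                             # zero matrix: nothing to extract
            break
        v = [a / norm for a in w]
    return v
```

Each call to `update` touches only the running means, so no past samples need to be stored; the quadratic-cost `scatter` call is used here for clarity and would be impractical for text data with very high dimensionality.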

Similar resources

An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving minimum sum-of-squares clustering problems using their difference-of-convex (DC) representations. The proposed algorithm is based on an incremental approach and applies the well-known DC algorithm at each iteration. It is tested and compared with other clustering algorithms using large real-world data sets.

Image retrieval based on incremental subspace learning

Many problems in information processing, such as face recognition, image/text retrieval, and data visualization, involve some form of dimensionality reduction. Typical linear dimensionality reduction algorithms include principal component analysis (PCA), random projection, and locality-preserving projection (LPP). These techniques are generally unsupervised, which allows them to model data ...

Incremental Subspace Data-mining Algorithm Based on Data-flow Density of Complex Network

To improve the accuracy of data mining in large-scale complex networks, this paper proposes an incremental subspace data-mining algorithm based on the data-flow density of complex networks. To this end, a latent variable model is first introduced and incorporated into the data-flow density model so that the network is divided into different communities and the defect...

A TWO-STAGE METHOD FOR DAMAGE DETECTION OF LARGE-SCALE STRUCTURES

A novel two-stage algorithm for detecting damage in large-scale structures under static loads is presented. The technique uses the vector of response change (VRC) and the sensitivities of responses with respect to the elemental damage parameters (RSEs). It is shown that the VRC lies approximately in the subspace spanned by the RSEs corresponding to the damaged elements. The property is leveraged in...

Research on Incremental Learning Method Based on Support Vector Machine Method

An incremental learning algorithm based on the support vector machine is proposed to process large-scale data or data generated in batches. The initial target concept learned by the standard support vector machine algorithm is updated by an updating model. Compared with existing incremental learning algorithms, this algorithm can achieve the incremental inverse process and the training time is in invers...

Publication date: 2005